Extending Full Text Search Engine for Mathematical Content
Abstract
The WWW has become the main resource of mathematical knowledge. Currently available full text search engines can be used on these documents, but they are deficient in almost all cases. By applying axioms and equivalent transformations, and by using different notation, each formula can be expressed in numerous ways. Most of these documents do not contain semantic information; therefore, precise mathematical interpretation is impossible. On the other hand, where semantic information is available, it can help to give more precise results. In this work we address these issues and present a new technique for searching for mathematical formulae in real-world mathematical documents while still offering an extensible level of mathematical awareness. It exploits the advantages of a full text search engine and stores each formula not only once but in several generalised representations. Because it is designed as an extension, any full text search engine can adopt it. Based on the proposed theory we developed EgoMath, a new mathematical search engine. Experiments with EgoMath over two document sets, containing semantic information, showed that this technique can be used to build a fully-fledged mathematical search engine.
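The idea of storing each formula in several generalised representations can be illustrated with a minimal sketch. This is not EgoMath's actual algorithm, which the abstract does not spell out; the tokenizer, the `CONST` and `ID` placeholders, and the function names `tokenize`, `generalise`, and `representations` are all assumptions made for illustration. Each representation is a plain string, so it could in principle be handed to an ordinary full text index as an extra term sequence.

```python
import re

# Hypothetical tokenizer: numbers, alphabetic names, or any single symbol.
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]+|\S")


def tokenize(formula):
    """Split a linear formula string into tokens, ignoring whitespace."""
    return TOKEN_RE.findall(formula)


def generalise(tokens, replace_numbers, replace_names):
    """Return one representation with the chosen token classes abstracted.

    Simplification: every alphabetic token is treated as a variable, so
    function names such as 'sin' would also become ID in this sketch.
    """
    out = []
    for tok in tokens:
        if replace_numbers and tok[0].isdigit():
            out.append("CONST")
        elif replace_names and tok.isalpha():
            out.append("ID")
        else:
            out.append(tok)
    return " ".join(out)


def representations(formula):
    """List the distinct representations, from most to least specific."""
    tokens = tokenize(formula)
    levels = [(False, False), (True, False), (False, True), (True, True)]
    seen = []
    for replace_numbers, replace_names in levels:
        rep = generalise(tokens, replace_numbers, replace_names)
        if rep not in seen:
            seen.append(rep)
    return seen
```

Under these assumptions, a query such as `y^2+3` and an indexed formula `x^2 + 3` differ in their exact form but share the generalised form `ID ^ 2 + 3`, so a text index holding all representations could still match them, and could rank exact matches above generalised ones.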
Similar resources
The MCAT Math Retrieval System for NTCIR-10 Math Track
NTCIR Math Track targets mathematical content access based on both natural language text and mathematical formulae. This research describes the participation of MCAT group in the NTCIR math retrieval subtask and math understanding subtask. We introduce our mathematical search system that is capable of formula search, and full-text search. We also introduce our mathematical description extractio...
MathWebSearch at NTCIR-10
We present and analyze the results of the MATHWEBSEARCH system in the NTCIR-10 Math pilot task, a challenge in mathematical information retrieval. MATHWEBSEARCH is a content-based search engine that focuses on fast query answering for interactive applications. It is currently restricted to exact formula search, i.e. no similarity search and no full-text search. As the MATHWEBSEARCH system has b...
Integrating RDF Querying Capabilities into a Distributed Search Infrastructure
The Semantic Web is inherently distributed, and covers both metadata and full-text information. Semantic search therefore can profit a lot from peer-to-peer infrastructures as well as from powerful metadata search functionalities based on full-text search technologies. In this paper we focus on an approach extending an existing P2P search infrastructure with RDF querying capabilities, which bot...
The impact of webpage content characteristics on webpage visibility in search engine results (Part I)
Content characteristics of a webpage include factors such as keyword position in a webpage, keyword duplication, layout, and their combination. These factors may impact webpage visibility in a search engine. Four hypotheses are presented relating to the impact of selected content characteristics on webpage visibility in search engine results lists. Webpage visibility can be improved by increasi...
SHOW AND TELL: A Seamlessly Integrated Tool For Searching with Image Content And Text
In this paper, an image search tool that combines keyword and image content feature querying and search is presented. The developed search tool tries to bridge the gap between commercial search engines, which are based on keyword search, and CBIR (Content Based Image Retrieval) systems developed mostly in the academic field, designed to search based on image content. The tool is implemented by ...
Publication date: 2008